WEBVTT

00:00:01.000 --> 00:00:04.166
And thank you, the organisers, for inviting me

00:00:04.166 --> 00:00:04.866
to speak here.

00:00:05.300 --> 00:00:08.500
I'm honoured and humbled to speak among my

00:00:08.500 --> 00:00:13.666
community and to speak alongside people whose works

00:00:13.666 --> 00:00:15.800
I've been following since the beginning of my

00:00:15.800 --> 00:00:16.466
academic journey.

00:00:16.666 --> 00:00:18.500
So sincerely, thank you very much.

00:00:19.900 --> 00:00:22.300
So I wanted to talk today about some

00:00:22.300 --> 00:00:24.933
work that I've been doing on distance preserving

00:00:24.933 --> 00:00:26.400
graph machine learning.

00:00:27.000 --> 00:00:28.966
And before I start, let me just give

00:00:28.966 --> 00:00:31.166
you a picture that will sort of motivate

00:00:31.166 --> 00:00:32.733
the work that we were doing here.

00:00:33.500 --> 00:00:36.233
So the idea that we were exploring is

00:00:36.233 --> 00:00:40.566
that graphs have structural topology at all scales.

00:00:41.200 --> 00:00:43.266
So you can think at the simplest level,

00:00:43.333 --> 00:00:45.400
at the local scale, you have things like

00:00:45.400 --> 00:00:49.266
local invariance, few hop neighbourhoods, even very small

00:00:49.266 --> 00:00:51.800
things like scalars per node, like the degree

00:00:51.800 --> 00:00:52.966
of each node.

00:00:53.600 --> 00:01:00.200
And these fundamental invariance, they drive things like

00:01:00.200 --> 00:01:01.300
decentralised consensus.

00:01:01.600 --> 00:01:03.600
They drive things like message-passing convolutional neural

00:01:03.600 --> 00:01:07.500
networks by doing these successive local aggregations.

00:01:08.333 --> 00:01:09.500
Then if you go to the other end

00:01:09.500 --> 00:01:11.366
of the spectrum, you have the global scale.

00:01:11.966 --> 00:01:14.300
And you can think of, when I say

00:01:14.300 --> 00:01:16.400
global scale here, I'm thinking of, I have

00:01:16.400 --> 00:01:17.633
two nodes in my graph.

00:01:17.733 --> 00:01:18.933
They're not necessarily connected.

00:01:19.533 --> 00:01:21.966
How are they related throughout the graph?

00:01:21.966 --> 00:01:23.900
So for instance, how many paths are there

00:01:23.900 --> 00:01:24.600
between them?

00:01:25.100 --> 00:01:27.433
Or perhaps more importantly, what is the shortest

00:01:27.433 --> 00:01:29.166
path that I can take to get from

00:01:29.166 --> 00:01:30.466
one node to another node?

00:01:30.966 --> 00:01:32.900
And this is an important problem that drives

00:01:32.900 --> 00:01:39.466
things like retrieval, routing, navigation, which are very

00:01:39.466 --> 00:01:42.166
important problems in areas like robotics and operations

00:01:42.166 --> 00:01:42.666
research.

00:01:43.533 --> 00:01:45.666
And then somewhere in the middle, you have

00:01:45.666 --> 00:01:48.566
the mesoscopic scale, which is sort of like

00:01:48.566 --> 00:01:50.733
this middle scale between local and global.

00:01:51.366 --> 00:01:57.200
And this scale is usually revealed in things

00:01:57.200 --> 00:02:00.966
like multi-hop walks, effective resistance, as Antonio

00:02:00.966 --> 00:02:04.566
Ortega said in the first plenary.

00:02:05.100 --> 00:02:09.733
And this will reveal motifs, things like communities,

00:02:10.900 --> 00:02:12.300
bottlenecks, and so on.

00:02:12.566 --> 00:02:14.066
So the way I think about this is,

00:02:14.966 --> 00:02:17.866
if I know that two nodes are connected

00:02:17.866 --> 00:02:20.866
by an edge, they are sort of in

00:02:20.866 --> 00:02:22.166
this local scale together.

00:02:22.833 --> 00:02:24.400
If they're not connected by an edge, if

00:02:24.400 --> 00:02:25.700
I ask myself, OK, are they in the

00:02:25.700 --> 00:02:26.300
same community?

00:02:26.566 --> 00:02:28.066
That's sort of like the mesoscopic scale.

00:02:28.433 --> 00:02:29.700
Imagine they're not in the same community.

00:02:30.233 --> 00:02:31.066
How far are they?

00:02:31.400 --> 00:02:32.633
And this is what I think of as

00:02:32.633 --> 00:02:33.866
the global scale.

00:02:34.000 --> 00:02:37.900
So this is just some taxonomy for what

00:02:37.900 --> 00:02:40.400
I mean when I'm referring to these different

00:02:40.400 --> 00:02:40.733
scales.

00:02:42.566 --> 00:02:44.933
And then the question that we were interested

00:02:44.933 --> 00:02:47.766
in with my PhD student, Milet, by the

00:02:47.766 --> 00:02:48.833
way, I should have said, this is in

00:02:48.833 --> 00:02:52.366
collaboration with Shovik Dara from Georgia Tech and

00:02:52.366 --> 00:02:54.233
Meher Chaitanya from KTH.

00:02:55.700 --> 00:02:59.366
So what we were interested in understanding was,

00:02:59.866 --> 00:03:03.866
can existing graph signal processing machine learning methods

00:03:03.866 --> 00:03:05.533
embed this topology at all scales?

00:03:06.266 --> 00:03:08.966
So when it comes to local embeddings, GNNs,

00:03:09.066 --> 00:03:10.800
signal processing, do this beautifully.

00:03:11.066 --> 00:03:14.100
This is beautifully handled by graph SP and

00:03:14.100 --> 00:03:16.233
NML, at least in its traditional form.

00:03:16.866 --> 00:03:19.400
Most graph signal networks are local.

00:03:19.933 --> 00:03:23.600
So convolutions are defined by these weighted sums

00:03:23.600 --> 00:03:25.500
of local aggregations.

00:03:25.666 --> 00:03:27.200
And this is the case also with aggregation

00:03:27.200 --> 00:03:28.900
message passing networks.

00:03:29.100 --> 00:03:30.933
These are all just different names for the

00:03:30.933 --> 00:03:31.366
same thing.

00:03:32.066 --> 00:03:34.666
So this, I guess we can safely say

00:03:35.066 --> 00:03:37.800
it's handled in particular by our community here.

00:03:39.066 --> 00:03:42.333
But when it comes to the global and

00:03:42.333 --> 00:03:44.533
the mesoscopic scale, there are some challenges.

00:03:44.966 --> 00:03:48.166
So for instance, when it comes to finding

00:03:48.166 --> 00:03:51.400
global embeddings, for instance, shortest path distances between

00:03:51.400 --> 00:03:54.900
two nodes in a graph, this is difficult

00:03:54.900 --> 00:03:57.633
to handle with local architectures like GNNs.

00:03:57.766 --> 00:04:00.033
You would have to feel the number of

00:04:00.033 --> 00:04:03.500
hops equal to the diameter, which is typically

00:04:03.500 --> 00:04:04.600
not something that you want.

00:04:05.900 --> 00:04:07.800
This can be handled in theory by graph

00:04:07.800 --> 00:04:11.300
transformers, in particular dense, so-called dense graph

00:04:11.300 --> 00:04:13.766
transformers, which are ones that allow every node

00:04:13.766 --> 00:04:16.333
to attend to every other node, even without

00:04:16.333 --> 00:04:16.900
an edge.

00:04:18.366 --> 00:04:20.566
However, we know that this is not very

00:04:20.566 --> 00:04:24.266
efficient because this dense attention scales as O

00:04:24.266 --> 00:04:24.800
of N squared.

00:04:25.200 --> 00:04:26.766
You have to fill this dense and by

00:04:26.766 --> 00:04:27.300
end matrix.

00:04:28.200 --> 00:04:30.733
And in particular, when thinking about the shortest

00:04:30.733 --> 00:04:32.366
path problem, which is what we're gonna be

00:04:32.366 --> 00:04:34.766
talking about in a second, if you were

00:04:34.766 --> 00:04:36.766
to store all the shortest distance from a

00:04:36.766 --> 00:04:39.133
node to all the other nodes in the

00:04:39.133 --> 00:04:41.166
graph, that requires n-dimensional embeddings.

00:04:41.866 --> 00:04:45.800
And so you can't, for instance, transfer this

00:04:45.800 --> 00:04:47.033
architecture to a larger graph.

00:04:47.100 --> 00:04:48.400
You have no size generalisation.

00:04:49.100 --> 00:04:51.866
And moreover, I mean, you could say, okay,

00:04:51.933 --> 00:04:54.600
I don't wanna find all shortest path distances.

00:04:54.900 --> 00:04:57.066
I wanna find the top K, but the

00:04:57.066 --> 00:04:58.900
top K won't match across nodes.

00:04:59.266 --> 00:05:02.066
So you'll have this problem of the different

00:05:02.066 --> 00:05:05.933
embedding dimensions are gonna correspond to different nodes,

00:05:06.100 --> 00:05:08.366
and that's very bad for GNNs.

00:05:08.433 --> 00:05:10.133
It really hurts generalisation.

00:05:10.666 --> 00:05:14.666
Okay, so this is one area where both

00:05:14.666 --> 00:05:16.900
GNNs and transformers struggle.

00:05:17.400 --> 00:05:22.100
When it comes to mesoscopic embeddings, extracting these

00:05:22.100 --> 00:05:25.466
mesoscopic embeddings, extracting this mesoscopic, right?

00:05:25.533 --> 00:05:26.833
So we know that, for instance, if you

00:05:26.833 --> 00:05:28.666
have well-behaved graphs, like dense graphs, you

00:05:28.666 --> 00:05:32.566
can extract things like communities by using spectral

00:05:32.566 --> 00:05:37.400
filters, spectral GNNs, of which convolutional filters and

00:05:37.400 --> 00:05:39.233
GNNs are a type, right?

00:05:39.866 --> 00:05:42.200
But they can be challenging to learn and

00:05:42.200 --> 00:05:43.766
practise in some settings.

00:05:44.366 --> 00:05:48.166
And for instance, I have, it's a little

00:05:48.166 --> 00:05:50.133
bit small, but the second reference there on

00:05:50.133 --> 00:05:51.700
the slide is a paper that we wrote

00:05:51.700 --> 00:05:54.366
a few years ago for ICASP, where we

00:05:54.366 --> 00:05:57.533
showed that GNNs are a little bit better

00:05:57.533 --> 00:06:00.566
than spectral embeddings at finding communities in sparse

00:06:00.566 --> 00:06:04.533
graphs, because they essentially, in sparse graphs, like

00:06:04.533 --> 00:06:08.966
the top K eigenvectors are not as informative

00:06:08.966 --> 00:06:10.300
as in dense graphs, right?

00:06:10.600 --> 00:06:12.666
But what GNNs do well is they interpolate

00:06:12.666 --> 00:06:14.800
the noise coming from the other eigenvectors.

00:06:15.400 --> 00:06:16.866
However, the problem is that when you wanna

00:06:16.866 --> 00:06:18.966
transfer that GNN to other graphs, even in

00:06:18.966 --> 00:06:23.466
the same sort of random graph family, this

00:06:23.466 --> 00:06:25.666
noise interpolation is not very transferable.

00:06:25.933 --> 00:06:28.100
And so you have some challenges, right?

00:06:28.133 --> 00:06:30.300
Even though this is feasible, you need large

00:06:30.300 --> 00:06:33.400
enough receptive field, and there are some challenges

00:06:33.400 --> 00:06:36.466
associated, for instance, with sparse graphs.

00:06:36.466 --> 00:06:39.766
The idea here was to try to, in

00:06:39.766 --> 00:06:43.566
order to try to address these issues, was

00:06:43.566 --> 00:06:46.066
to try to preserve the appropriate metric for

00:06:46.066 --> 00:06:47.266
each scale, right?

00:06:47.366 --> 00:06:50.866
So when you talk about global scale, I've

00:06:50.866 --> 00:06:52.866
been talking a lot about distances between nodes,

00:06:52.933 --> 00:06:53.100
right?

00:06:53.800 --> 00:06:56.000
And in particular, the shortest path distance is

00:06:56.000 --> 00:06:57.300
a valid metric, right?

00:06:57.366 --> 00:07:00.500
So what we will do here is I

00:07:00.500 --> 00:07:04.266
will show you this class of algorithms that

00:07:04.266 --> 00:07:06.433
I think has been studied for a good

00:07:06.433 --> 00:07:09.966
while, and especially in these web conf community

00:07:09.966 --> 00:07:10.566
and so on.

00:07:12.600 --> 00:07:14.966
And these algorithms, what they do is they

00:07:14.966 --> 00:07:18.100
try to, in a local step, learn some

00:07:18.100 --> 00:07:20.533
sort of embedding, and then in a global

00:07:20.533 --> 00:07:22.900
step, make sure that when you compute distances

00:07:22.900 --> 00:07:25.600
between embeddings, they sort of replicate shortest path

00:07:25.600 --> 00:07:26.000
distances.

00:07:26.533 --> 00:07:30.100
And we will see that these are worst

00:07:30.100 --> 00:07:31.866
case in nature, so they're a little conservative.

00:07:32.433 --> 00:07:35.133
And so with me and my collaborator, Shavik

00:07:35.133 --> 00:07:37.100
Dara, what we did was we analysed these

00:07:37.100 --> 00:07:39.433
algorithms in homogeneous random graphs, which is a

00:07:39.433 --> 00:07:42.500
much more realistic class of graphs, to try

00:07:42.500 --> 00:07:44.866
to obtain sort of average case distortion guarantees.

00:07:45.600 --> 00:07:47.166
So this is what we're gonna do first,

00:07:47.233 --> 00:07:49.233
and then I'm also gonna show some empirical

00:07:49.233 --> 00:07:52.533
results where we replace this local step that

00:07:52.533 --> 00:07:55.266
I'll be more clear about in a few

00:07:55.266 --> 00:07:57.166
moments with a GNN.

00:07:57.466 --> 00:07:59.266
And this allows us to have things like

00:07:59.266 --> 00:08:04.266
transferability, all the nice properties of GNNs.

00:08:04.866 --> 00:08:06.266
So this will be the first part, and

00:08:06.266 --> 00:08:08.333
then in the second part, we're going to

00:08:08.333 --> 00:08:09.733
adjust the mesoscopic scale.

00:08:10.266 --> 00:08:12.933
So here, what we're gonna do is we're

00:08:12.933 --> 00:08:14.933
gonna use this type of embedding called the

00:08:14.933 --> 00:08:16.600
maximum adjacency search embeddings.

00:08:16.733 --> 00:08:20.900
These are inspired by cascades in opinion dynamics.

00:08:21.666 --> 00:08:23.466
These are essentially the work of my other

00:08:23.466 --> 00:08:25.500
collaborator, Meher, who's from this area.

00:08:26.433 --> 00:08:29.100
And what these embeddings do is they count

00:08:29.866 --> 00:08:31.800
occurrences of nodes in multi-hop.

00:08:32.000 --> 00:08:34.100
What we will use these embeddings for is

00:08:34.100 --> 00:08:36.266
to rewire the graph such that we are

00:08:36.266 --> 00:08:39.100
capturing not just these local connections in the

00:08:39.100 --> 00:08:42.600
graph, but also these maximally adjacent connections.

00:08:42.800 --> 00:08:43.966
So there are nodes that might not be

00:08:43.966 --> 00:08:46.200
connected by an edge, but they're in multiple

00:08:46.200 --> 00:08:50.266
kind of multi-hop paths, and that means

00:08:50.266 --> 00:08:51.800
they're maximally connected somehow.

00:08:53.100 --> 00:08:55.500
And our goal will be to rewire the

00:08:55.500 --> 00:08:59.166
graph, pass this graph through any architecture like

00:08:59.166 --> 00:09:02.066
a GNN or a graph transformer, and hopefully

00:09:02.066 --> 00:09:04.933
through this rewiring, be able to extract more

00:09:04.933 --> 00:09:06.366
of this mesoscopic structure.

00:09:06.733 --> 00:09:10.000
And here we'll see that the valid metric,

00:09:10.166 --> 00:09:12.333
like the appropriate metric is the effective resistance,

00:09:12.633 --> 00:09:15.300
and that we're able to lower bound the

00:09:15.300 --> 00:09:19.266
effective resistance of these rewired edges that we

00:09:19.266 --> 00:09:20.100
will add.

00:09:21.800 --> 00:09:24.233
Okay, so let's start with the first part.

00:09:25.833 --> 00:09:28.266
So the problem that we are trying to

00:09:28.266 --> 00:09:31.966
address here is to approximate shortest path distances,

00:09:32.200 --> 00:09:32.300
right?

00:09:32.366 --> 00:09:33.533
This is sort of like a sub-problem

00:09:33.533 --> 00:09:35.233
of the shortest path problem.

00:09:35.366 --> 00:09:36.166
It's a little bit simpler.

00:09:36.300 --> 00:09:37.966
You're not trying to find the shortest path.

00:09:38.000 --> 00:09:39.733
You're just trying to find the shortest distance.

00:09:41.066 --> 00:09:42.966
We know that one algorithm that you can

00:09:42.966 --> 00:09:46.166
use to find shortest path problems in particular,

00:09:46.466 --> 00:09:50.533
but the problem is that finding exact all

00:09:50.533 --> 00:09:53.500
-pairs distances using this algorithm and even simplifications

00:09:53.500 --> 00:09:56.300
of it is pretty prohibitive at scale, right?

00:09:56.400 --> 00:09:58.400
So it scales as O of N times

00:09:58.400 --> 00:10:00.000
N plus M, where N is the number

00:10:00.000 --> 00:10:01.300
of nodes and M is the number of

00:10:01.300 --> 00:10:02.100
edges in the graph.

00:10:03.300 --> 00:10:06.300
And so one idea that has been explored

00:10:06.300 --> 00:10:09.366
a lot by people working on this problem

00:10:09.900 --> 00:10:12.266
is this idea of embedding nodes in some

00:10:12.266 --> 00:10:14.900
space such that the distance between the embeddings

00:10:14.900 --> 00:10:17.366
of the nodes in this space approximate the

00:10:17.366 --> 00:10:20.466
graph distance, meaning the shortest path distance, okay?

00:10:20.966 --> 00:10:23.000
And then the quality of these embeddings is

00:10:23.000 --> 00:10:25.466
measured by a multiplicative distortion, right?

00:10:25.533 --> 00:10:28.433
So if D hat here is the approximation,

00:10:28.966 --> 00:10:30.800
we want it to be lower and upper

00:10:30.800 --> 00:10:31.900
bounded like this.

00:10:32.200 --> 00:10:34.833
We say that this approximation has one plus

00:10:34.833 --> 00:10:38.000
minus epsilon distortion if it's lower bounded by

00:10:38.000 --> 00:10:40.666
one minus epsilon, the actual shortest path distance,

00:10:41.266 --> 00:10:43.200
and upper bounded by one plus epsilon, the

00:10:43.200 --> 00:10:45.800
actual shortest path distance, okay?

00:10:46.166 --> 00:10:48.066
Steps, a local step and a global step.

00:10:48.333 --> 00:10:50.600
The local step is typically done offline.

00:10:50.600 --> 00:10:52.766
So we do this step for all the

00:10:52.766 --> 00:10:53.533
nodes in advance.

00:10:54.166 --> 00:10:55.900
And the way it works is you sample,

00:10:56.100 --> 00:10:57.266
so you're given your graph.

00:10:57.566 --> 00:10:59.600
Here I'm just omitting the edges just to

00:10:59.600 --> 00:11:02.066
make the image clearer.

00:11:02.266 --> 00:11:03.966
But what you do is you sample landmark

00:11:03.966 --> 00:11:09.500
sets, landmark sets, S0 through SR. And these

00:11:09.500 --> 00:11:11.866
are sets of seed or landmark nodes.

00:11:12.100 --> 00:11:14.700
So here I'm representing them by the orange

00:11:14.700 --> 00:11:15.666
nodes.

00:11:16.366 --> 00:11:18.266
And then what you do is you start

00:11:18.266 --> 00:11:20.266
the distance of every node in the graph

00:11:20.266 --> 00:11:23.066
to the nearest landmark per set, right?

00:11:23.133 --> 00:11:25.700
So each node will have an r-dimensional

00:11:25.700 --> 00:11:29.133
embedding where the jth entry of that embedding

00:11:29.133 --> 00:11:33.166
is the minimum distance between you and an

00:11:33.166 --> 00:11:35.166
element of that set, okay?

00:11:37.500 --> 00:11:39.400
You do this offline for all nodes.

00:11:39.566 --> 00:11:41.000
Typically the way people do this is doing

00:11:41.000 --> 00:11:44.200
breadth-first search from the seeds or landmarks.

00:11:45.166 --> 00:11:47.366
And then you store all these distances.

00:11:47.366 --> 00:11:49.000
You get your node embeddings.

00:11:49.166 --> 00:11:51.800
You can have the global step in a

00:11:51.800 --> 00:11:53.966
query-based fashion, right?

00:11:54.000 --> 00:11:55.733
So I can come to my server or

00:11:55.733 --> 00:11:58.600
my computer and say, I want to approximate

00:11:58.600 --> 00:12:00.300
the shortest path between U and V.

00:12:01.066 --> 00:12:02.766
So the way you do this is you

00:12:02.766 --> 00:12:04.966
look at these embeddings and you compute some

00:12:04.966 --> 00:12:06.900
sort of approximation of the distance based on

00:12:06.900 --> 00:12:07.733
the triangle inequality.

00:12:10.266 --> 00:12:13.400
So there are two kind of classical ways

00:12:13.400 --> 00:12:14.000
to do this.

00:12:14.766 --> 00:12:17.000
The first way gives you a lower bound

00:12:17.500 --> 00:12:21.133
for the actual shortest path distance, right?

00:12:21.300 --> 00:12:22.966
So I'm writing the lower bound as the

00:12:22.966 --> 00:12:24.400
underline.

00:12:25.133 --> 00:12:26.700
And then here, I mean, the local step

00:12:26.700 --> 00:12:28.500
is exactly as I described, right?

00:12:28.566 --> 00:12:31.700
You're creating that embedding by distances from each

00:12:31.700 --> 00:12:33.600
node to the closest seed in each one

00:12:33.600 --> 00:12:34.066
of the sets.

00:12:34.500 --> 00:12:36.333
And then in the global step, and this

00:12:36.333 --> 00:12:38.966
is called Lugan's algorithm, what you do is

00:12:38.966 --> 00:12:42.300
you use the inverse triangle inequality and you

00:12:42.300 --> 00:12:44.600
obtain this lower bound, but essentially taking the

00:12:44.600 --> 00:12:46.866
L infinity norm of the difference between the

00:12:46.866 --> 00:12:47.266
embeddings.

00:12:47.533 --> 00:12:49.300
So what this is doing in practise is

00:12:49.300 --> 00:12:54.166
it's trying to find the maximum distance between

00:12:54.166 --> 00:12:56.433
the embeddings coordinate-wise, right?

00:12:56.500 --> 00:12:58.500
So that you're trying to find the set

00:12:58.900 --> 00:13:01.866
that achieves this maximal distance.

00:13:04.800 --> 00:13:07.600
And then there's another approach that was proposed

00:13:07.600 --> 00:13:09.866
in this algorithm by, in this paper, by

00:13:09.866 --> 00:13:14.266
Dasarma and others that gives you an upper

00:13:14.266 --> 00:13:14.966
bound, right?

00:13:14.966 --> 00:13:19.000
So it gives this D overline that upper

00:13:19.000 --> 00:13:20.666
bounds the actual shortest path distance.

00:13:21.000 --> 00:13:24.100
And here, seed nodes, what we also do

00:13:24.100 --> 00:13:27.500
is we store the actual index of the

00:13:27.500 --> 00:13:30.966
seed in SJ that achieves this distance, right?

00:13:31.000 --> 00:13:33.000
So we want to know the size, you

00:13:33.000 --> 00:13:36.333
know, like the size of the distance of

00:13:36.333 --> 00:13:38.500
the node to each set, which is the

00:13:38.500 --> 00:13:41.566
node in that sense that is closest to

00:13:41.566 --> 00:13:42.500
you, okay?

00:13:43.100 --> 00:13:46.800
So using these two sort of vectors, right?

00:13:46.933 --> 00:13:48.933
One in our dimensional vector and the other

00:13:48.933 --> 00:13:51.200
a scalar per node.

00:13:51.566 --> 00:13:54.100
When we do the global step, we use

00:13:54.100 --> 00:13:56.900
the triangle inequality, not the inverse, the reverse

00:13:56.900 --> 00:14:01.100
triangle inequality now to upper bound the shortest

00:14:01.100 --> 00:14:04.766
path distance by the minimum among the sum

00:14:04.766 --> 00:14:09.800
between OXUJ and FXPJ such that the seeds

00:14:09.800 --> 00:14:12.700
that achieve the minimum distance in the seed

00:14:12.700 --> 00:14:14.666
set SJ match, right?

00:14:14.800 --> 00:14:17.766
So this is what I have here, but

00:14:17.766 --> 00:14:20.400
to approximate the distance between this U1 and

00:14:20.400 --> 00:14:22.733
U2 here, I need to have a common

00:14:22.733 --> 00:14:23.833
seed, right?

00:14:24.366 --> 00:14:26.300
And so that's why you need this extra

00:14:26.300 --> 00:14:27.400
embedding here.

00:14:28.500 --> 00:14:28.966
What's wrong?

00:14:29.600 --> 00:14:30.400
I'm sorry.

00:14:31.566 --> 00:14:32.766
Yes, I'll stop moving.

00:14:33.666 --> 00:14:33.900
Okay.

00:14:36.066 --> 00:14:40.300
All right, so these two are classical algorithms.

00:14:40.300 --> 00:14:42.200
And like I said, these algorithms don't mean

00:14:42.200 --> 00:14:46.000
anything if they don't have distortion guarantees, right?

00:14:46.066 --> 00:14:48.633
So we need a distortion guarantee both for

00:14:48.633 --> 00:14:50.433
Bourgan's algorithm and the Sarma's algorithm.

00:14:51.100 --> 00:14:53.500
Bourgan's algorithm gives us a lower bound.

00:14:53.766 --> 00:14:56.300
And this theorem here, which is based on

00:14:56.300 --> 00:14:59.700
Bourgan's embedding theorem, the lower bound distortion.

00:15:00.966 --> 00:15:04.966
This theorem is actually, I mean, doesn't necessarily

00:15:04.966 --> 00:15:07.300
apply just to the shortest from this very

00:15:07.300 --> 00:15:10.366
important paper by Bourgan and then later by

00:15:10.366 --> 00:15:14.333
Matuszek, where they essentially show the distortion of

00:15:14.333 --> 00:15:17.100
embedding metric spaces into Hilbert spaces, right?

00:15:17.100 --> 00:15:19.300
Because you always lose something when you do

00:15:19.300 --> 00:15:19.566
that.

00:15:19.900 --> 00:15:22.900
And this is basically that theorem applied to

00:15:22.900 --> 00:15:24.166
this particular problem.

00:15:25.466 --> 00:15:28.700
So what the theorem says is that, given

00:15:28.700 --> 00:15:33.233
a large enough graph, for any constant C

00:15:33.233 --> 00:15:36.466
that you fix, there exist optimal embeddings, which

00:15:36.466 --> 00:15:38.666
are these X star embeddings that I have

00:15:38.666 --> 00:15:43.266
here, with dimension at least N to the

00:15:43.266 --> 00:15:46.166
one over C log N, such that the

00:15:46.166 --> 00:15:51.100
lower bound estimator has bounded distortion, where the

00:15:51.100 --> 00:15:53.366
distortion is given by that one over two

00:15:53.366 --> 00:15:54.100
C minus one.

00:15:55.000 --> 00:15:55.100
Okay?

00:15:55.800 --> 00:15:57.766
So for instance, if you wanted to get

00:15:57.766 --> 00:16:02.500
distortion of say, order, let's see here.

00:16:04.033 --> 00:16:06.333
So if we do 1.5, this would

00:16:06.333 --> 00:16:07.133
be half, right?

00:16:07.133 --> 00:16:09.300
So you would get something like to the

00:16:09.300 --> 00:16:11.900
one, like one over 1.5 log N

00:16:11.900 --> 00:16:15.600
embeddings that gives you a distortion of one

00:16:15.600 --> 00:16:16.000
half there.

00:16:17.733 --> 00:16:18.133
Okay.

00:16:21.433 --> 00:16:24.000
Then, besides having a distortion guarantee for the

00:16:24.000 --> 00:16:24.633
lower bound, right?

00:16:24.700 --> 00:16:27.266
So for Bourgan's algorithm, we also have distortion

00:16:27.266 --> 00:16:30.400
guarantees for the upper bound from that more

00:16:30.400 --> 00:16:31.166
recent paper.

00:16:31.833 --> 00:16:34.733
And in that paper, besides proposing this algorithm,

00:16:34.900 --> 00:16:37.633
they also provide guarantees whenever the seed sets

00:16:37.633 --> 00:16:40.933
have size two to the J, okay?

00:16:40.933 --> 00:16:43.000
With J ranging from zero to the floor

00:16:43.000 --> 00:16:44.266
of log N.

00:16:45.500 --> 00:16:47.166
And the reason why you need this construction

00:16:47.166 --> 00:16:50.633
is one of the reasons is because, remember

00:16:50.633 --> 00:16:53.000
that I said that you always need to

00:16:53.000 --> 00:16:54.100
have this common seed.

00:16:59.066 --> 00:17:01.333
You always need to have the common seed.

00:17:01.500 --> 00:17:02.800
And so you always need to have at

00:17:02.800 --> 00:17:07.000
least one set that has size one, right?

00:17:07.000 --> 00:17:11.600
Because you might not find matching seeds in

00:17:11.600 --> 00:17:11.966
the others.

00:17:12.133 --> 00:17:15.733
You have to set S zero, give you

00:17:15.733 --> 00:17:17.200
that triangle inequality approximation.

00:17:18.100 --> 00:17:19.800
And what this theorem says is that you

00:17:19.800 --> 00:17:23.366
get two C minus one distortion guarantee, right?

00:17:23.400 --> 00:17:26.633
For the upper bound, for your upper bound

00:17:26.633 --> 00:17:31.333
approximation of shortest path with embeddings X star

00:17:31.333 --> 00:17:34.100
with dimension at least N to the one

00:17:34.100 --> 00:17:34.866
over C log N.

00:17:35.166 --> 00:17:36.200
Okay, so very similar.

00:17:37.000 --> 00:17:37.966
Distortion guarantees.

00:17:39.133 --> 00:17:41.100
And same minimum embedding dimension.

00:17:43.300 --> 00:17:47.233
Now what is the issue with these distortion

00:17:47.233 --> 00:17:47.600
results?

00:17:48.000 --> 00:17:49.766
The issue with them is that they are

00:17:49.766 --> 00:17:54.200
worst case in nature in the sense that

00:17:54.200 --> 00:17:56.866
they hold even for, like imagine a graph

00:17:56.866 --> 00:17:59.300
where approximating shortest paths is very hard.

00:17:59.400 --> 00:18:01.400
For instance, a graph where you might have

00:18:01.400 --> 00:18:04.966
like a nice sort of connected blob, but

00:18:04.966 --> 00:18:07.166
you have these like very long paths, like

00:18:07.166 --> 00:18:08.500
spider legs or something, right?

00:18:08.933 --> 00:18:11.933
Approximating shortest paths in this graph is very

00:18:11.933 --> 00:18:14.600
difficult if you don't sample a seed in

00:18:14.600 --> 00:18:15.200
that path.

00:18:16.766 --> 00:18:19.500
So these guarantees hold even in that case.

00:18:19.566 --> 00:18:21.600
So you can imagine that they're pretty loose,

00:18:21.666 --> 00:18:21.866
right?

00:18:22.033 --> 00:18:24.400
And they are worst case.

00:18:25.800 --> 00:18:30.000
And additionally, because these embedding sizes are not

00:18:30.000 --> 00:18:32.566
small, this becomes prohibitive at scale.

00:18:33.133 --> 00:18:35.500
And like I said, it's pessimistic for the

00:18:35.500 --> 00:18:37.166
structured graphs that we actually have, right?

00:18:37.200 --> 00:18:38.666
And actually what I did from the two

00:18:38.666 --> 00:18:40.166
previous slides to this one is I just

00:18:40.166 --> 00:18:44.033
expressed the minimum embedding dimension in terms of

00:18:44.033 --> 00:18:46.166
epsilon, because now I wanna have one minus

00:18:46.166 --> 00:18:48.633
epsilon, one plus epsilon distortion, so clean distortion.

00:18:49.166 --> 00:18:53.000
And so I just adjusted the minimum embedding

00:18:53.000 --> 00:18:53.300
size.

00:18:56.066 --> 00:19:00.000
Okay, so what is your average case graph,

00:19:00.100 --> 00:19:00.266
right?

00:19:00.266 --> 00:19:01.866
Like what are graphs that are more realistic

00:19:01.866 --> 00:19:04.566
and closer to what you have in practise?

00:19:04.966 --> 00:19:09.233
One model that is very used to model

00:19:09.233 --> 00:19:11.800
such graphs is the inhomogeneous random graph model.

00:19:12.666 --> 00:19:16.100
And this model is basically an extension of

00:19:16.100 --> 00:19:17.700
a stochastic block model.

00:19:18.000 --> 00:19:21.966
So what happens is that in this model,

00:19:22.066 --> 00:19:24.700
each node carries a type, right?

00:19:24.766 --> 00:19:26.166
So each node belongs to sort of like

00:19:26.166 --> 00:19:27.966
a type or a family, and the edge

00:19:27.966 --> 00:19:30.300
probability will depend on the endpoint type.

00:19:30.300 --> 00:19:32.200
So you'll have a matrix D, which is

00:19:32.200 --> 00:19:33.833
called the affinity matrix that is sort of

00:19:33.833 --> 00:19:36.900
encoding the affinity between nodes of different types.

00:19:38.966 --> 00:19:41.733
And what you do is in order to

00:19:41.733 --> 00:19:44.766
compute the probability of two nodes connecting in

00:19:44.766 --> 00:19:48.433
this graph, you divide the corresponding entry of

00:19:48.433 --> 00:19:52.300
D by the number of nodes in the

00:19:52.300 --> 00:19:55.500
sort of the community of sort of like

00:19:55.500 --> 00:19:56.366
your neighbour, right?

00:19:57.100 --> 00:19:59.033
So D is not a probability, but when

00:19:59.033 --> 00:20:01.833
you take the ratio of it with the

00:20:01.833 --> 00:20:04.766
community size, your neighbour's community size, you get

00:20:04.766 --> 00:20:05.466
a probability.

00:20:06.866 --> 00:20:08.500
This model is very general, so it includes

00:20:08.500 --> 00:20:11.133
things like the stochastic block model or the

00:20:11.133 --> 00:20:12.600
Serenium when there's only one type.

00:20:13.733 --> 00:20:16.966
And so our idea is to try to

00:20:16.966 --> 00:20:20.966
derive those distortion guarantees now, not just for

00:20:20.966 --> 00:20:23.000
these inhomogeneous random graphs that are a better

00:20:23.000 --> 00:20:25.366
model of your typical graph that you encounter

00:20:25.366 --> 00:20:27.266
in the real world.

00:20:28.833 --> 00:20:31.300
So before I talk about our distortion results,

00:20:31.333 --> 00:20:32.633
I just want to give you a sense

00:20:32.633 --> 00:20:34.400
of the proof idea.

00:20:36.200 --> 00:20:39.400
So like I said earlier, in these algorithms,

00:20:39.600 --> 00:20:41.300
these local global algorithms, for sure there's a

00:20:41.300 --> 00:20:44.766
path approximation, what you do is you effectively,

00:20:45.100 --> 00:20:47.000
in order to compute the embeddings, you do

00:20:47.000 --> 00:20:48.900
breadth-first search from the seeds, right?

00:20:49.466 --> 00:20:50.900
Breadth-first search, you can think of it

00:20:50.900 --> 00:20:52.266
as local exploration, right?

00:20:52.333 --> 00:20:54.600
So I'm a node, I look at my

00:20:54.600 --> 00:20:56.233
one-hop neighbours, then my two-hop neighbours,

00:20:56.700 --> 00:20:57.333
and so on.

00:20:57.966 --> 00:21:00.966
And as a sort of like a dynamical

00:21:00.966 --> 00:21:02.266
process, you can think of this as a

00:21:02.266 --> 00:21:03.333
branching process, right?

00:21:03.366 --> 00:21:05.733
And in particular, if you have multiple types,

00:21:05.866 --> 00:21:08.633
this will be a multi-type branching process

00:21:08.633 --> 00:21:12.433
with mean matrix given by D, the affinity

00:21:12.433 --> 00:21:12.933
matrix.

00:21:14.833 --> 00:21:17.433
What you can show is that this branching

00:21:17.433 --> 00:21:19.400
process, which is basically, I mean, this is

00:21:19.400 --> 00:21:22.800
called a, it's sort of like a binomial

00:21:22.800 --> 00:21:25.700
branching process, I think it's called the Galton

00:21:25.700 --> 00:21:28.633
-Watson process, and basically, it's the process where,

00:21:28.733 --> 00:21:30.933
each node when it branches out, it has

00:21:30.933 --> 00:21:34.566
a number of sort of neighbours given by

00:21:34.566 --> 00:21:36.266
the mean of the process, which in this

00:21:36.266 --> 00:21:39.333
case here, if you have affinity matrix D,

00:21:39.700 --> 00:21:41.566
that mean is going to be lambda one,

00:21:41.666 --> 00:21:43.766
the formal eigenvalue of show.

00:21:44.800 --> 00:21:48.233
Using this branching process approximation is that the

00:21:48.233 --> 00:21:50.600
boundary of the k-hop neighbourhood of node

00:21:50.600 --> 00:21:54.333
U is of order lambda one to the

00:21:54.333 --> 00:21:55.233
k, right?

00:21:57.033 --> 00:22:00.100
And what we further show, so after showing

00:22:00.100 --> 00:22:06.266
these concentration of both the boundary size of

00:22:06.266 --> 00:22:09.833
the k-hop neighbourhood, and also the size

00:22:09.833 --> 00:22:11.800
of the entire k-hop neighbourhood, is you're

00:22:11.800 --> 00:22:15.733
able to show that typical distances concentrate using

00:22:15.733 --> 00:22:17.733
concentration of Bernoulli random variables.

00:22:18.366 --> 00:22:21.766
Essentially, DUV, where DUV is the typical distance,

00:22:22.533 --> 00:22:27.433
concentrates around log base lambda one of n,

00:22:27.933 --> 00:22:28.333
okay?

00:22:29.133 --> 00:22:32.133
And so this typical distance is what is

00:22:32.133 --> 00:22:34.333
going to give us our distortion guarantee, and

00:22:34.333 --> 00:22:36.800
concentration around it is what's gonna give us

00:22:36.800 --> 00:22:37.700
our distortion guarantee.

00:22:39.133 --> 00:22:41.900
So here is our result.

00:22:42.133 --> 00:22:44.133
So using those two algorithms for the lower

00:22:44.133 --> 00:22:46.733
bound and the upper bound, what we show

00:22:46.733 --> 00:22:50.833
is that under bound regularity conditions on the

00:22:50.833 --> 00:22:54.600
inhomogeneous random graph model, those landmark or seed

00:22:54.600 --> 00:22:58.966
-based embeddings achieve one plus minus epsilon distortion

00:22:58.966 --> 00:23:02.433
with high probability with embedding dimension, so for

00:23:02.433 --> 00:23:06.200
the lower bound, at least n to the

00:23:06.200 --> 00:23:09.000
one minus epsilon log lambda one n, and

00:23:09.000 --> 00:23:12.033
then this more complicated expression for one plus

00:23:12.033 --> 00:23:14.600
epsilon but in practise, I mean, if you

00:23:14.600 --> 00:23:19.300
forget about this minimum here, you get something

00:23:19.300 --> 00:23:23.200
like, it's like one, I think it's one

00:23:23.200 --> 00:23:24.633
plus epsilon or something like that.

00:23:27.900 --> 00:23:28.966
Yes, thank you.

00:23:29.700 --> 00:23:31.933
Yes, so, right.

00:23:32.366 --> 00:23:36.100
So the, much better, okay.

00:23:36.766 --> 00:23:39.266
So there are a few interesting things to

00:23:39.266 --> 00:23:40.366
notice here, right?

00:23:40.400 --> 00:23:41.900
Like this doesn't look much better than the

00:23:41.900 --> 00:23:43.166
embeddings that we had before, but they are

00:23:43.166 --> 00:23:43.733
a little bit better.

00:23:44.166 --> 00:23:45.400
So the first thing that you notice is

00:23:45.400 --> 00:23:47.200
that this is governed now by the spectral

00:23:47.200 --> 00:23:50.600
radius, lambda one of the affinity matrix, and

00:23:50.600 --> 00:23:53.133
this eta n here is the minimum type

00:23:53.133 --> 00:23:54.400
size exponent, right?

00:23:54.433 --> 00:23:56.533
So this is sort of controlling for not

00:23:56.533 --> 00:24:01.733
having very imbalanced types or communities, and what

00:24:01.733 --> 00:24:04.133
you see is that more balanced types will

00:24:04.133 --> 00:24:07.200
give you smaller embeddings, and that even though

00:24:07.200 --> 00:24:10.933
this still doesn't look amazing, it's polynomially smaller

00:24:10.933 --> 00:24:13.833
than those worst case, not distortion guarantees, the

00:24:13.833 --> 00:24:18.000
worst case embedding sizes by approximately n to

00:24:18.000 --> 00:24:19.000
the epsilon over two.

00:24:19.533 --> 00:24:22.200
So there's a polynomial sort of gain that

00:24:22.200 --> 00:24:25.066
you have here when you focus on homogenous

00:24:25.066 --> 00:24:25.833
or non-graphs, right?

00:24:25.866 --> 00:24:27.633
Like when you forget about worst case graphs.

00:24:30.133 --> 00:24:31.666
So that was our main result.

00:24:31.833 --> 00:24:33.233
Another thing that you can do is you

00:24:33.233 --> 00:24:35.533
can extend this to the case of continuous

00:24:35.533 --> 00:24:36.033
types.

00:24:36.766 --> 00:24:41.133
So instead of having like a finite set

00:24:41.133 --> 00:24:43.033
of types like in a stochastic block model,

00:24:43.033 --> 00:24:45.400
you could have infinitely many types, for instance,

00:24:45.600 --> 00:24:46.800
as in a graphon, right?

00:24:47.633 --> 00:24:49.233
And so now your D is no longer

00:24:49.233 --> 00:24:53.433
a matrix, it's a kernel, and this allows

00:24:53.433 --> 00:24:55.600
you to extend these results to an even

00:24:55.600 --> 00:24:59.200
larger set of graphs beyond SBM and Erdos

00:24:59.200 --> 00:25:01.900
-Renyi, such as Chung-Liu graphs and others,

00:25:02.533 --> 00:25:05.500
and the distortion bounds are very similar.

00:25:05.700 --> 00:25:08.866
The way you extend that distortion result to

00:25:08.866 --> 00:25:12.166
these graphs with infinitely many types is you

00:25:12.166 --> 00:25:16.800
sort of sandwich the kernel, the continuous kernel,

00:25:16.900 --> 00:25:18.900
by step function kernels and consider those step

00:25:18.900 --> 00:25:22.000
function kernels to be graphs with finite types.

00:25:22.300 --> 00:25:25.433
That was the sort of theoretical contribution to

00:25:25.433 --> 00:25:30.000
derive these distortion guarantees for more realistic, more

00:25:30.000 --> 00:25:31.000
average case graphs.

00:25:31.800 --> 00:25:34.400
On the numerical side, what we did was

00:25:34.400 --> 00:25:37.433
we replaced the breadth-first search step in

00:25:37.433 --> 00:25:41.533
the local step of those algorithms with a

00:25:41.533 --> 00:25:42.900
shortest path surrogate, right?

00:25:42.900 --> 00:25:44.933
So it's just trained via imitation learning to

00:25:44.933 --> 00:25:47.800
imitate breadth-first search or shortest path from

00:25:47.800 --> 00:25:48.233
the seeds.

00:25:50.200 --> 00:25:52.233
Even though these GNNs are not suited to

00:25:52.233 --> 00:25:55.733
long shortest path computation, for short shortest paths,

00:25:55.900 --> 00:25:58.966
they do reasonably enough.

00:26:00.133 --> 00:26:02.100
And what we did was we trained on

00:26:02.100 --> 00:26:02.933
small graphs, right?

00:26:02.933 --> 00:26:05.500
We trained on a small 100-node and

00:26:05.500 --> 00:26:06.533
homogeneous random graph.

00:26:06.733 --> 00:26:08.233
This was an Erdos-Renyi random graph.

00:26:08.900 --> 00:26:11.666
And then transferred to graphs up to 32

00:26:11.666 --> 00:26:13.700
times larger.

00:26:14.433 --> 00:26:15.866
And what we see here on this slide

00:26:15.866 --> 00:26:17.733
is, so when your graphs are pretty sparse,

00:26:17.866 --> 00:26:17.966
right?

00:26:18.000 --> 00:26:19.833
So when the lambda is like three and

00:26:19.833 --> 00:26:24.800
four, which is almost out of the supercritical

00:26:24.800 --> 00:26:28.766
regime for Erdos-Renyi, what you see is

00:26:28.766 --> 00:26:32.600
that breadth-first search always wins.

00:26:32.900 --> 00:26:34.533
But once you get a little bit denser,

00:26:34.633 --> 00:26:36.233
so your lambda is at five or six,

00:26:36.933 --> 00:26:40.866
eventually you get better MSC with the GNNs

00:26:40.866 --> 00:26:43.500
at a much lower cost, right?

00:26:43.600 --> 00:26:45.100
So like you have to, this is for

00:26:45.100 --> 00:26:47.033
the GNNs, so you have to make way

00:26:47.033 --> 00:26:49.900
less calculations, especially for large graph sizes because

00:26:49.900 --> 00:26:51.433
breadth-first search is exhaustive, right?

00:26:51.533 --> 00:26:54.233
GNNs only go as far as k-hops,

00:26:54.933 --> 00:26:55.600
l-layers.

00:26:58.800 --> 00:27:00.766
Then we also did this experiment for real

00:27:00.766 --> 00:27:04.400
-world networks, still using our GNNs trained on

00:27:04.400 --> 00:27:06.566
those synthetic graphs.

00:27:07.266 --> 00:27:10.600
And so what we did here was, again,

00:27:10.866 --> 00:27:15.200
we transferred those GNNs to these real-world

00:27:15.200 --> 00:27:19.200
graphs with order of 10,000 nodes.

00:27:19.733 --> 00:27:21.566
And what we saw here was that, yes,

00:27:21.666 --> 00:27:24.366
like for a given, starting from a certain

00:27:24.366 --> 00:27:26.700
graph size, you get lower MSC when you

00:27:26.700 --> 00:27:30.600
use GNNs to approximate the lower bound breadth

00:27:30.600 --> 00:27:31.166
-first search.

00:27:31.800 --> 00:27:33.966
And our main goal here was just to

00:27:33.966 --> 00:27:39.666
see, do these inhomogeneous random graphs behave in

00:27:39.666 --> 00:27:43.300
practise like your average real-world graph?

00:27:44.933 --> 00:27:48.500
And given these results, we feel like they

00:27:48.500 --> 00:27:52.466
must behave enough like these IHGs for the

00:27:52.466 --> 00:27:53.200
theory to hold.

00:27:54.966 --> 00:27:57.966
Okay, so this is the end of part

00:27:57.966 --> 00:27:58.266
one.

00:27:58.900 --> 00:28:01.000
I guess I can take questions now if

00:28:01.000 --> 00:28:03.433
there are any questions before I move on

00:28:03.433 --> 00:28:04.200
to part two.

00:28:08.066 --> 00:28:11.600
So the question I saw here was how

00:28:11.600 --> 00:28:12.800
to select the seed sets.

00:28:13.100 --> 00:28:16.200
So in practise, you just sample them at

00:28:16.200 --> 00:28:20.033
random, but it's an important problem, right?

00:28:20.066 --> 00:28:22.200
And you can find smarter ways to do

00:28:22.200 --> 00:28:22.300
it.

00:28:22.333 --> 00:28:24.600
So for instance, I think the best way

00:28:24.600 --> 00:28:26.100
to do it is to sample seeds with

00:28:26.100 --> 00:28:27.333
high centrality, right?

00:28:27.366 --> 00:28:30.633
Because they're gonna be not just very locally

00:28:30.633 --> 00:28:32.266
connected, but very central in the graph.

00:28:33.333 --> 00:28:39.200
Okay, so I'll move on.

00:28:39.566 --> 00:28:42.833
So in the second part, like I mentioned

00:28:42.833 --> 00:28:44.400
earlier, what we're gonna be doing is we're

00:28:44.400 --> 00:28:47.033
going to try to rewire the graphs such

00:28:47.033 --> 00:28:50.966
as to preserve or exploit the mesoscopic structure

00:28:50.966 --> 00:28:54.166
a little bit better in signal processing and

00:28:54.166 --> 00:28:56.100
machine learning pipelines.

00:28:57.633 --> 00:29:00.133
And the reason for that is because both

00:29:00.133 --> 00:29:03.666
GNNs and graph transformers can struggle at this

00:29:03.666 --> 00:29:05.200
mesoscopic scale, right?

00:29:05.200 --> 00:29:09.066
So graph neural networks or convolutional message-passing

00:29:09.066 --> 00:29:11.566
GNNs, they aggregate over k-hop neighbourhoods where

00:29:11.566 --> 00:29:13.266
I typically want k to be small.

00:29:14.200 --> 00:29:16.533
So this receptive field is local.

00:29:17.900 --> 00:29:20.166
And like I mentioned earlier, in some situations,

00:29:20.266 --> 00:29:24.100
you might have a hard time accessing communities

00:29:24.100 --> 00:29:25.566
and bottlenecks and that sort of thing.

00:29:26.800 --> 00:29:29.800
People sometimes call this over-smoothing, over-squashing,

00:29:30.000 --> 00:29:30.166
right?

00:29:30.800 --> 00:29:33.066
And then when it comes to graph transformers

00:29:33.066 --> 00:29:37.500
with dense attention, they have, they sort of,

00:29:37.766 --> 00:29:39.100
I mean, if you're not gonna use the

00:29:39.100 --> 00:29:42.100
graph structure, right, to compute your attention pairs,

00:29:42.233 --> 00:29:44.533
you sort of lose that graph prior that

00:29:44.533 --> 00:29:45.900
makes GNNs work so well.

00:29:46.366 --> 00:29:48.766
And moreover, you have this O of N

00:29:48.766 --> 00:29:51.300
squared attention that is very prohibitive for large

00:29:51.300 --> 00:29:51.666
graphs.

00:29:53.133 --> 00:29:57.000
At the same time, I've already motivated, right?

00:29:57.000 --> 00:30:00.533
motivated, right, but like we we want in

00:30:00.533 --> 00:30:02.433
some problems, in some problems the representations that

00:30:02.433 --> 00:30:07.033
we want to learn or design, they, we

00:30:07.033 --> 00:30:08.833
want them to reflect things like communities and

00:30:08.833 --> 00:30:09.400
bottlenecks.

00:30:09.700 --> 00:30:12.333
One thing that does that are multi-hop

00:30:12.333 --> 00:30:13.033
walks, right?

00:30:13.033 --> 00:30:15.666
So if you take multiple, say, random walks

00:30:15.666 --> 00:30:18.200
of large enough length and you sort of

00:30:18.200 --> 00:30:20.700
average statistics that you collect in these walks,

00:30:21.800 --> 00:30:24.533
you're, in well-behaved graphs, you're able to

00:30:24.533 --> 00:30:27.466
get, each rewire the graph by promoting these

00:30:27.466 --> 00:30:29.866
pairs that are repeatedly showing up in the

00:30:29.866 --> 00:30:33.466
same multi-hop walks.

00:30:36.933 --> 00:30:40.866
So to do this, we're going to use

00:30:40.866 --> 00:30:46.433
this idea of contagion dynamics from from opinion

00:30:46.433 --> 00:30:49.433
dynamics and we're going to use two what

00:30:49.433 --> 00:30:52.333
they call cascade rules for discovering this mesoscopic

00:30:52.333 --> 00:30:56.700
structure, maximum adjacency search and threshold adjacency search.

00:30:57.000 --> 00:30:59.933
So the way they work is they start

00:30:59.933 --> 00:31:01.133
at a random seed.

00:31:03.300 --> 00:31:07.133
Besides activating this random seed at step zero,

00:31:07.233 --> 00:31:09.733
you also activate a random set of nodes

00:31:09.733 --> 00:31:12.833
and then at each step what you're going

00:31:12.833 --> 00:31:16.133
to do is you're going to activate the

00:31:16.133 --> 00:31:19.333
inactive node with the most active neighbours in

00:31:19.333 --> 00:31:20.600
maximum adjacency search.

00:31:21.233 --> 00:31:24.433
You're going to be a bit more forgiving

00:31:24.433 --> 00:31:26.200
with this activation and you're going to activate

00:31:26.200 --> 00:31:28.333
any inactive node that has at least kappa

00:31:28.333 --> 00:31:29.433
active neighbours.

00:31:31.900 --> 00:31:34.333
These rules are just two different rules to

00:31:34.333 --> 00:31:35.233
do the same thing, right?

00:31:35.266 --> 00:31:38.500
Like they're both privileged, these multi-hop pairs

00:31:38.500 --> 00:31:42.233
that are supported by short redundant paths, which

00:31:42.233 --> 00:31:44.400
is this mesoscopic structure that we're trying to

00:31:44.800 --> 00:31:45.833
to extract.

00:31:46.266 --> 00:31:50.033
And so here I just have two videos

00:31:50.033 --> 00:31:56.933
showing threshold cascade with two parameters kappa.

00:31:58.133 --> 00:32:00.533
So here, it's a little fast, but you

00:32:00.533 --> 00:32:02.266
can see I initialise at zero.

00:32:02.833 --> 00:32:05.333
I also sample nodes four and three and

00:32:05.333 --> 00:32:07.733
then I start looking at these maximally adjacent

00:32:07.733 --> 00:32:10.500
adjacent nodes to zero, four, and three and

00:32:10.500 --> 00:32:12.233
I do this repeatedly for a certain number

00:32:12.233 --> 00:32:12.766
of steps.

00:32:13.466 --> 00:32:15.933
This is for small thresholds who end up

00:32:15.933 --> 00:32:17.800
activating a lot of nodes by the end.

00:32:18.000 --> 00:32:19.766
So they end up activating 10 out of

00:32:19.766 --> 00:32:20.500
60 nodes.

00:32:21.100 --> 00:32:22.933
Here, if you increase the threshold a little

00:32:22.933 --> 00:32:25.733
bit, activations are more rare.

00:32:26.133 --> 00:32:29.100
Naturally, you're activating, eventually you're activating the one

00:32:29.100 --> 00:32:29.533
-hop neighbours.

00:32:31.100 --> 00:32:32.833
So you're not getting rid of this local

00:32:32.833 --> 00:32:35.433
structure, but there's some selectivity in how you're

00:32:35.433 --> 00:32:38.400
activating your further hop neighbours, right?

00:32:38.400 --> 00:32:40.200
Your two-hop, three-hop neighbours and so

00:32:40.200 --> 00:32:40.366
on.

00:32:41.200 --> 00:32:44.700
So this selectivity is based on this maximal

00:32:44.700 --> 00:32:46.200
adjacency idea.

00:32:54.700 --> 00:32:55.133
Okay.

00:32:58.433 --> 00:33:01.433
Another small thing here.

00:33:01.666 --> 00:33:05.633
So we retain both, we also so the

00:33:05.633 --> 00:33:07.633
graphs that we extract, these rewired graphs, they're

00:33:07.633 --> 00:33:07.900
weighted.

00:33:08.100 --> 00:33:09.700
So we symmetrise them and we weight them

00:33:09.700 --> 00:33:10.833
by coactivation frequency.

00:33:11.000 --> 00:33:13.300
So for instance, if I don't know, two

00:33:13.300 --> 00:33:16.166
here shows up many multi-hop paths from

00:33:16.166 --> 00:33:16.700
zero, right?

00:33:16.700 --> 00:33:19.700
Like after doing this cascade for a number

00:33:19.700 --> 00:33:22.433
of times, the weight is going to reflect

00:33:22.433 --> 00:33:24.700
that that frequency of appearance.

00:33:25.500 --> 00:33:30.533
And also these auxiliary graphs, they're not as

00:33:30.533 --> 00:33:32.400
not super expensive to build, right?

00:33:32.433 --> 00:33:36.233
Like they're, if m is order of n,

00:33:36.300 --> 00:33:37.733
this is going to be O of n

00:33:37.733 --> 00:33:38.800
to build.

00:33:39.200 --> 00:33:40.733
And then the idea is to take these

00:33:40.733 --> 00:33:43.100
auxiliary graphs and pass them as the backbone

00:33:44.933 --> 00:33:47.433
to a GNN or a graph transformer.

00:33:49.833 --> 00:33:53.633
So we have a couple of theoretical results

00:33:53.633 --> 00:33:56.333
to sort of support this approach.

00:33:57.133 --> 00:34:00.833
One is showing that rewiring helps when you

00:34:00.833 --> 00:34:03.066
have graphs that are heterophilic.

00:34:04.033 --> 00:34:08.566
So here, assume that you have that we

00:34:08.566 --> 00:34:10.133
can write three events, right?

00:34:10.166 --> 00:34:12.866
So the first one is label agreements, meaning

00:34:12.866 --> 00:34:15.000
nodes have the same label in whatever task

00:34:15.000 --> 00:34:16.266
you're trying to solve.

00:34:19.333 --> 00:34:22.166
This will be calligraphic L, event calligraphic L.

00:34:23.100 --> 00:34:25.533
Event calligraphic E will be that there's an

00:34:25.533 --> 00:34:27.533
edge, not a directed edge, just an edge

00:34:27.533 --> 00:34:30.133
between nodes U and V, meaning they're adjacent

00:34:30.133 --> 00:34:30.833
in the graph.

00:34:31.466 --> 00:34:33.400
And calligraphic R will mean that you have

00:34:33.400 --> 00:34:38.566
reinforcement, meaning that there's at least kappa co

00:34:38.566 --> 00:34:40.866
-occurrences of U and V in.

00:34:41.433 --> 00:34:45.966
So given these events and given a random

00:34:45.966 --> 00:34:49.033
unordered pair of distinct nodes U, V, we

00:34:49.033 --> 00:34:52.100
define the standard edge homophily as the probability

00:34:52.100 --> 00:34:55.000
that you have label agreement given that you

00:34:55.000 --> 00:34:57.600
have an edge between U and V, and

00:34:57.600 --> 00:35:00.366
the reinforcement homophily or the homophily in the

00:35:00.366 --> 00:35:03.533
rewired graph as the probability that you have

00:35:03.533 --> 00:35:06.566
label agreement given that you have reinforcement, or

00:35:06.566 --> 00:35:08.266
in other words, given that you have adjacency

00:35:08.266 --> 00:35:09.300
in the rewired graph.

00:35:10.733 --> 00:35:12.500
And then what we were asking here was,

00:35:12.766 --> 00:35:15.733
when is this homophily in the rewired graph

00:35:15.733 --> 00:35:18.466
greater than homophily in the original graph?

00:35:18.766 --> 00:35:20.433
The reason for this is because we know

00:35:20.433 --> 00:35:23.233
that GNN's graph convolutions work well when you

00:35:23.233 --> 00:35:24.100
have homophily, right?

00:35:24.133 --> 00:35:27.666
So when similar nodes, not similar nodes, nodes

00:35:27.666 --> 00:35:30.966
that are connected in the graph have similar

00:35:31.966 --> 00:35:35.033
neighbours, similar labels, or for instance signals.

00:35:35.433 --> 00:35:37.966
This is what Dirichlet energy measures, right?

00:35:37.966 --> 00:35:41.933
More in our talking more in signal processing

00:35:41.933 --> 00:35:42.333
terms.

00:35:43.333 --> 00:35:46.633
So what we show in this proposition here

00:35:46.633 --> 00:35:47.833
is that under two conditions.

00:35:47.833 --> 00:35:51.333
So under the condition that we have sort

00:35:51.333 --> 00:35:55.833
of heterophily in G, meaning that whenever the

00:35:55.833 --> 00:35:57.833
labels between two nodes U and V agree,

00:35:58.333 --> 00:36:00.566
you have a larger probability that they are

00:36:00.566 --> 00:36:03.500
in the multi-hop path versus them being

00:36:03.500 --> 00:36:05.033
direct neighbours.

00:36:05.733 --> 00:36:08.366
And when the reinforcement event is less frequent

00:36:08.366 --> 00:36:12.400
than edges, then the homophily of that rewired

00:36:12.400 --> 00:36:14.533
graph is greater than the homophily of the

00:36:14.533 --> 00:36:15.233
original graph.

00:36:15.833 --> 00:36:17.633
And this is greater than or equal, but

00:36:17.633 --> 00:36:20.033
it will be strict when these conditions are

00:36:20.033 --> 00:36:20.566
strict.

00:36:21.533 --> 00:36:25.866
And here we tried to illustrate this with

00:36:25.866 --> 00:36:27.933
some synthetic and real-world graphs.

00:36:28.400 --> 00:36:30.766
So what we did was we, the homophily

00:36:30.766 --> 00:36:34.333
before and after rewiring, versus the log of

00:36:34.333 --> 00:36:38.866
the original average degree and the sizes of

00:36:38.866 --> 00:36:41.200
the balls here are indicating the homophily.

00:36:41.433 --> 00:36:44.966
So small, like these dots, have low homophily.

00:36:45.166 --> 00:36:47.300
The larger circles have high homophily.

00:36:48.133 --> 00:36:49.166
And so what you see is that we

00:36:49.166 --> 00:36:52.966
get the biggest increase in homophily right over

00:36:52.966 --> 00:36:57.966
here when the original graphs are heterophilic and

00:36:57.966 --> 00:37:00.733
when they have a high degree, right?

00:37:01.400 --> 00:37:05.233
In some cases you, the homophily worsens, like

00:37:05.233 --> 00:37:08.066
especially for very homophilic low degree graphs.

00:37:12.066 --> 00:37:14.800
Okay, and then the other results that we

00:37:14.800 --> 00:37:16.600
have, and this is a very simple result,

00:37:16.633 --> 00:37:19.133
it's very related to what Antonio discussed on

00:37:19.133 --> 00:37:24.700
Monday, the effective resistance of the edges in

00:37:24.700 --> 00:37:27.933
the rewired graph, right?

00:37:28.033 --> 00:37:30.300
So basically, I mean, what we're capturing once

00:37:30.300 --> 00:37:34.233
we do this rewiring based on co-activation,

00:37:34.466 --> 00:37:37.900
right, or nodes appearing in multiple multi-hop

00:37:37.900 --> 00:37:41.900
paths, is we're encoding structural redundancy, right, in

00:37:41.900 --> 00:37:42.300
the graph.

00:37:43.166 --> 00:37:48.433
And basically what that structural redundancy means formally

00:37:48.433 --> 00:37:50.933
is that, then what that means is that

00:37:50.933 --> 00:37:54.533
G has K pairwise edge disjoint short paths

00:37:54.533 --> 00:37:56.900
from the seed to U, right?

00:37:57.033 --> 00:38:03.233
And this directly bounds the effective resistance I

00:38:03.233 --> 00:38:07.633
probably don't need to define this again after

00:38:07.633 --> 00:38:12.033
Antonio's talk, but it's basically just measuring sort

00:38:12.033 --> 00:38:17.966
of like the voltage difference between two nodes,

00:38:18.200 --> 00:38:20.266
right, based on the pseudo-inverse of the

00:38:20.266 --> 00:38:22.300
Laplacian here, if you think that you have

00:38:22.300 --> 00:38:24.733
a current source at U and a current

00:38:24.733 --> 00:38:25.766
sink at V.

00:38:26.733 --> 00:38:28.600
And the upper bound that we get for

00:38:28.600 --> 00:38:32.766
this effective resistance in these rewired edges is

00:38:32.766 --> 00:38:36.533
given by L, which is the length of

00:38:36.533 --> 00:38:40.133
the kappa, where kappa is the activation threshold.

00:38:41.333 --> 00:38:45.000
Right, so essentially here, it's not that we're

00:38:45.000 --> 00:38:48.133
we're not preserving bottlenecks, right, because bottlenecks are

00:38:48.133 --> 00:38:53.633
nodes with, are edges with high effective resistance.

00:38:53.933 --> 00:38:59.000
We're preserving these redundant, so these pairs that

00:38:59.000 --> 00:39:01.800
correspond to redundant, to redundancy in the graph.

00:39:02.100 --> 00:39:06.233
We're preserving, we're connecting nodes with low effective

00:39:06.233 --> 00:39:07.366
resistance.

00:39:09.466 --> 00:39:13.633
Okay, so just a few numerical results before

00:39:13.633 --> 00:39:14.333
I conclude.

00:39:15.266 --> 00:39:17.366
So, and what we saw was, first of

00:39:17.366 --> 00:39:18.800
all, it does relatively well.

00:39:19.033 --> 00:39:24.533
So it's, in some data sets, it achieves,

00:39:24.733 --> 00:39:25.533
like, the best performance.

00:39:25.766 --> 00:39:28.666
In others, it does comparatively with using the

00:39:28.666 --> 00:39:31.233
original graph, which is, which is nice.

00:39:31.400 --> 00:39:33.666
But maybe the more interesting thing is the

00:39:33.666 --> 00:39:34.700
alignment with the theory.

00:39:34.833 --> 00:39:36.533
So we do see that the largest gains

00:39:36.533 --> 00:39:42.066
occur for heterophilic graphs with a high degree.

00:39:42.533 --> 00:39:45.700
When the assumptions of that first theorem on

00:39:45.700 --> 00:39:48.066
the, on the improved homophily of the, of

00:39:48.066 --> 00:39:51.833
the rewired graph don't hold, then we didn't

00:39:51.833 --> 00:39:53.466
observe a lot of gains.

00:39:53.666 --> 00:39:55.433
So this does seem to be more useful

00:39:55.433 --> 00:39:56.633
for heterophilic graphs.

00:39:57.166 --> 00:40:00.333
Although agnostics, so this is just GCN, right,

00:40:00.333 --> 00:40:04.100
and these are three different graph transformer architectures.

00:40:04.633 --> 00:40:06.066
All of these are sort of sparse graph

00:40:06.066 --> 00:40:09.333
transformers where you don't have the graph structure,

00:40:09.466 --> 00:40:13.566
but you sort of give the neighbours embeddings

00:40:13.566 --> 00:40:14.433
as tokens.

00:40:15.133 --> 00:40:16.833
So you can use the, this rewired graph

00:40:16.833 --> 00:40:17.033
there.

00:40:17.133 --> 00:40:19.200
These two here, we had pretty decent improvement.

00:40:20.166 --> 00:40:23.700
And here, there's just, just more detail on

00:40:23.700 --> 00:40:25.533
the improvement that we got on these heterophilic

00:40:25.533 --> 00:40:26.266
data sets, right?

00:40:26.266 --> 00:40:29.100
So especially for Chameleon and Cornell, the improvements

00:40:29.100 --> 00:40:31.233
were, were pretty good.

00:40:33.333 --> 00:40:36.666
Another best accuracy that we got on these

00:40:36.666 --> 00:40:40.766
rewired graphs, on the label homophily, the log

00:40:40.766 --> 00:40:44.733
average degree, and the connectivity of the, of

00:40:44.733 --> 00:40:47.566
both the original and the rewired graph.

00:40:48.866 --> 00:40:51.266
And what we saw is that, so M

00:40:51.266 --> 00:40:52.566
-A-S-T-A-S here are our

00:40:52.566 --> 00:40:53.166
methods, right?

00:40:53.633 --> 00:40:55.233
So the values of the beta here, but

00:40:55.233 --> 00:40:57.666
especially focussing on the R-squared, the adjusted

00:40:57.666 --> 00:41:01.333
R-squared here for, you see that the

00:41:02.000 --> 00:41:05.700
difference in, in, or the test accuracy, the

00:41:05.700 --> 00:41:09.066
gains in test accuracy, are much better explained

00:41:10.566 --> 00:41:13.933
by the structural properties of this rewired graph

00:41:13.933 --> 00:41:15.466
in the original graph, right?

00:41:15.533 --> 00:41:18.233
So there's improvements in, in homophily and this

00:41:18.233 --> 00:41:22.466
change in effective resistance in the rewired graph

00:41:22.466 --> 00:41:24.566
do seem to help performance.

00:41:25.433 --> 00:41:31.766
Okay, so just to conclude, today I talked

00:41:31.766 --> 00:41:37.066
about trying to preserve mesoscopic and global topology

00:41:37.066 --> 00:41:38.666
in, in graphs.

00:41:39.900 --> 00:41:43.333
For approximating global distances, we saw that landmark

00:41:43.333 --> 00:41:46.666
embeddings are a good method, right?

00:41:46.766 --> 00:41:48.800
That sort of still has that local step,

00:41:48.966 --> 00:41:51.333
but then augments that with a, with a

00:41:51.333 --> 00:41:52.000
global step.

00:41:52.766 --> 00:41:55.866
And even though the distortion guarantees for these

00:41:55.866 --> 00:41:58.266
algorithms is pretty bad in the worst case,

00:41:58.733 --> 00:42:01.133
we see that on more average case graphs,

00:42:01.233 --> 00:42:02.033
the distortion is.

00:42:02.266 --> 00:42:04.900
We also saw that there's an interesting dependence

00:42:04.900 --> 00:42:10.266
on the embedding dimension in the Peronic, and

00:42:10.266 --> 00:42:12.466
that we can replace that breadth-first search

00:42:12.466 --> 00:42:15.833
step in, in the local step with a

00:42:15.833 --> 00:42:20.166
GNN surrogate that then allows us to sort

00:42:20.166 --> 00:42:22.333
of leverage the nice properties of GNN such

00:42:22.333 --> 00:42:23.566
as, as transferability.

00:42:24.866 --> 00:42:27.333
And then in the second part, talking about

00:42:28.566 --> 00:42:32.633
embedding or preserving mesoscopic structure, we use graph

00:42:32.633 --> 00:42:33.433
cascades.

00:42:33.433 --> 00:42:37.866
At these global mesoscopic scales, convolutions and attention

00:42:37.866 --> 00:42:40.566
can be insufficient, right?

00:42:40.733 --> 00:42:43.466
So one possible, I mean, there are many

00:42:43.466 --> 00:42:47.100
approaches to these structures, right, to obtain embeddings

00:42:47.100 --> 00:42:49.566
that embed these structures.

00:42:50.866 --> 00:42:53.166
But the approach that we took here, and

00:42:53.166 --> 00:42:55.733
the approach that I, I, I think is,

00:42:56.366 --> 00:42:59.366
you know, I contend is a good approach,

00:42:59.566 --> 00:43:02.200
is to make the use of the appropriate

00:43:02.200 --> 00:43:04.466
metric, right, to try to preserve the appropriate

00:43:04.466 --> 00:43:04.833
metric.

00:43:04.966 --> 00:43:06.833
So in the global scale, the shortest path

00:43:06.833 --> 00:43:09.333
metric, in the local, in the mesoscopic scale,

00:43:09.833 --> 00:43:14.166
things like effective resistance, redundancy, and here we

00:43:14.166 --> 00:43:15.800
try to do this in two ways.

00:43:16.033 --> 00:43:19.433
So first, inside the ML model, right, so

00:43:20.500 --> 00:43:25.300
trying to learn embeddings that, and in the

00:43:25.866 --> 00:43:28.466
in trying to preserve mesoscopic scale, we did

00:43:28.466 --> 00:43:31.233
so outside of the ML model via rewiring,

00:43:31.533 --> 00:43:35.833
right, and existing ML and SP backbones.

00:43:36.733 --> 00:43:39.600
Now for future direction, so related to what

00:43:39.600 --> 00:43:41.333
Sergio asked, one thing that we can do

00:43:41.333 --> 00:43:42.766
is we can do better seed or landmark

00:43:42.766 --> 00:43:43.633
selection, right?

00:43:43.666 --> 00:43:46.366
So typically this is done uniformly at random.

00:43:47.333 --> 00:43:50.066
Our results rely on, but in practise you

00:43:50.066 --> 00:43:54.533
could do better by using degree or centrality,

00:43:55.066 --> 00:43:59.100
use even the cascade structure in these random

00:43:59.100 --> 00:44:01.266
walks to select the landmarks.

00:44:01.766 --> 00:44:05.466
Another direction that we're pursuing is extending those

00:44:05.466 --> 00:44:09.333
shortest path embedding distortion guarantees to directed graphs,

00:44:10.266 --> 00:44:12.633
which is a little bit more challenging because

00:44:12.633 --> 00:44:14.400
now your matrix D is not symmetric and

00:44:14.400 --> 00:44:16.566
you have to make sure you're respecting the

00:44:16.566 --> 00:44:18.066
individual edges orientations.

00:44:20.566 --> 00:44:25.500
Also perhaps doing some signal or feature-aware

00:44:25.500 --> 00:44:26.933
cascade selection, right?

00:44:26.966 --> 00:44:29.633
So we completely ignore the node features when

00:44:29.633 --> 00:44:32.466
we're doing the cascade selection or the when

00:44:32.466 --> 00:44:35.766
we're sort of computing those maximally adjacent node

00:44:35.766 --> 00:44:36.066
pairs.

00:44:36.466 --> 00:44:40.166
And then finally, because we're really focussing on

00:44:40.166 --> 00:44:44.033
this rewiring method on nodes that have some

00:44:44.033 --> 00:44:48.433
structural redundancy at the mesoscopic scale, the method

00:44:48.433 --> 00:44:51.666
ends up not working well for graphs where

00:44:51.666 --> 00:44:54.666
you have bottlenecks or for problems where the

00:44:54.666 --> 00:44:55.666
bottlenecks are important.

00:44:55.933 --> 00:45:00.733
So maybe combining this rewiring method with some

00:45:00.733 --> 00:45:05.200
bottleneck targeted rewiring could give us better gains

00:45:05.200 --> 00:45:09.300
at the mesoscopic scale and allow us to

00:45:09.300 --> 00:45:11.166
cross narrower cuts.

00:45:13.066 --> 00:45:16.266
Thank you, so these are the two papers

00:45:16.266 --> 00:45:20.133
and thank you to my collaborators.

00:45:25.733 --> 00:45:27.233
Thank you.

00:45:29.133 --> 00:45:31.033
Johanna, thank you for the presentation.

00:45:31.833 --> 00:45:33.066
Sorry if I missed this.

00:45:33.200 --> 00:45:36.433
I'm curious about in the results, given that

00:45:36.433 --> 00:45:41.166
graph transformers allegedly are naturally better for heterophilic

00:45:41.166 --> 00:45:46.633
tasks, whether this like model agnostic improvement puts

00:45:46.633 --> 00:45:49.666
GNNs on par to observe a difference.

00:45:50.366 --> 00:45:51.700
Yeah, I don't think I have the raw

00:45:51.700 --> 00:45:54.200
numbers, but transformers were doing better in practise.

00:45:54.433 --> 00:45:56.300
Yeah, like majority of the time.

00:45:56.566 --> 00:45:58.566
Also, we only tested GCN, I think.

00:45:58.700 --> 00:46:00.166
GCN is not very good for these.

00:46:00.533 --> 00:46:02.733
So yeah, that could also be an artefact.

00:46:03.266 --> 00:46:03.566
I see.

00:46:03.866 --> 00:46:04.300
Thank you.

00:46:07.533 --> 00:46:09.733
I also have a question on part one.

00:46:13.433 --> 00:46:17.066
Typically these real networks, they have always missing

00:46:17.066 --> 00:46:17.466
edges.

00:46:17.466 --> 00:46:20.766
They affect the shortest path search, etc.

00:46:21.533 --> 00:46:24.266
What people sometimes do, they still embed them

00:46:24.266 --> 00:46:26.800
in a hyperbolic space or any other space

00:46:26.800 --> 00:46:28.633
and try to figure out there how to

00:46:28.633 --> 00:46:29.566
find shortest paths.

00:46:31.200 --> 00:46:33.666
Do you think that's possible now to extend

00:46:33.666 --> 00:46:33.833
it?

00:46:35.966 --> 00:46:37.866
Answer to is there a shortest path between

00:46:37.866 --> 00:46:40.066
two disconnected nodes is no, there isn't.

00:46:40.166 --> 00:46:42.566
Or like the shortest path has, the shortest

00:46:42.566 --> 00:46:43.466
distance is infinity.

00:46:44.166 --> 00:46:49.566
Right, so we do assume or because, so

00:46:49.566 --> 00:46:51.733
when we do this branching process approximation, we

00:46:51.733 --> 00:46:54.166
assume that we're in the supercritical regime and

00:46:54.166 --> 00:46:56.166
there with high probability you have one giant

00:46:56.166 --> 00:46:57.066
connected component.

00:46:57.533 --> 00:46:58.933
So we leverage that, right?

00:47:01.533 --> 00:47:02.933
It's important, yeah.

00:47:10.266 --> 00:47:13.033
You talked about rewiring there as a way

00:47:13.033 --> 00:47:14.766
to mitigate this and to investigate.

00:47:15.366 --> 00:47:18.600
People also look extensively at virtual nodes.

00:47:18.600 --> 00:47:20.900
Which would be instead of just rewiring on

00:47:20.900 --> 00:47:22.700
the existing nodes, you have one extra node

00:47:22.700 --> 00:47:24.633
and then you can do both rewiring or

00:47:24.633 --> 00:47:25.400
new edges with that.

00:47:26.266 --> 00:47:27.766
And I have a feeling that it will

00:47:27.766 --> 00:47:30.433
really help in some of these aspects.

00:47:32.200 --> 00:47:35.333
Yeah, so actually if I'm not mistaken this

00:47:35.333 --> 00:47:37.133
VCR one, I think the V stands for

00:47:37.133 --> 00:47:37.400
virtual.

00:47:37.533 --> 00:47:39.933
So I think it's one of the graph

00:47:39.933 --> 00:47:41.866
transformers that has this virtual edge.

00:47:42.300 --> 00:47:43.800
So I mean we keep the virtual edge,

00:47:43.900 --> 00:47:46.533
but we just add this rewiring, right?

00:47:46.733 --> 00:47:49.533
And there are some decent improvements still.

00:47:49.733 --> 00:47:52.800
So I guess, yeah, there's, it helps.

00:47:52.800 --> 00:47:54.033
Because the virtual node, if you think about

00:47:54.033 --> 00:47:55.666
it, it's sort of like adding this global

00:47:55.666 --> 00:47:57.700
sort of node that's connecting everybody, right?

00:48:00.433 --> 00:48:02.400
That's true, that's true, yes.

